(feel free to edit the paragraphs here!)
This year the International Day of Happiness was celebrated on March 20 with the theme “Happiness For All, Together” while (ironically) the world is coping with a stressful pandemic. But don’t let that get on your nerves too much - we are about to look at some happiness-related data to (hopefully) get some clues as to why some countries are happier than others.
Specifically, we want to analyze and visualize the relationships between happiness scores, suicide rates, and mental health access across countries in recent years (2015-2019).
Data Wrangling
library(knitr)
library(tidyverse)
library(ggplot2)
library(viridis)
library(ggthemes)
library(plotly)
library(countrycode)
library(gridExtra)
Describe our data sources: e.g. our five happiness datasets (2015-2019) are obtained from Kaggle (https://www.kaggle.com/unsdsn/world-happiness)…
happiness_2015 <- read.csv("https://raw.githubusercontent.com/Reed-Statistics/math241S20PostGrp4/master/2015.csv?token=ANDSWNJBOL4TOHNDMTCN2B26PZABE")
happiness_2016 <- read.csv("https://raw.githubusercontent.com/Reed-Statistics/math241S20PostGrp4/master/2016.csv?token=ANDSWNPJDSBHOCDJ3XFEZGK6PZADI")
happiness_2017 <- read.csv("https://raw.githubusercontent.com/Reed-Statistics/math241S20PostGrp4/master/2017.csv?token=ANDSWNLSUG5QK23B6O6CWXS6PZAFA")
happiness_2018 <- read.csv("https://raw.githubusercontent.com/Reed-Statistics/math241S20PostGrp4/master/2018.csv?token=ANDSWNJIBN65O6CBRQRR2726PZAFW")
happiness_2019 <- read.csv("https://raw.githubusercontent.com/Reed-Statistics/math241S20PostGrp4/master/2019.csv?token=ANDSWNO6U5A4PHJ3QJBMZTS6PZAGM")
mental_health_facilities <- read.csv("https://raw.githubusercontent.com/Reed-Statistics/math241S20PostGrp4/master/mental_health_facilities.csv?token=ANDSWNOJDJM2QZZHRZY6XRC6PZAIA")
suicide_death_rates <- read.csv("https://raw.githubusercontent.com/Reed-Statistics/math241S20PostGrp4/master/suicide-death-rates.csv?token=ANDSWNKAB4OYFH5KITURBOK6PZAJG")
#time_series_deaths <- read.csv("")
First we needed to standardize the data across years. We removed some columns, such as region and the data authors’ dystopia measure, that were unhelpful for our analysis. Some years included error bounds on each country’s overall happiness score, while others didn’t. Furthermore, the names and order of the variables changed from year to year. We inserted NAs where necessary and made sure each year had the same variables, with easy-to-use names, before combining them into one dataset.
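The idea of padding missing columns with NA before stacking the years can be sketched on toy data. Everything in this sketch is hypothetical: the two small data frames, their values, and the helper pad_columns() are for illustration only and are not part of our pipeline, which renames columns by position instead.

```r
# Toy yearly tables (made-up values); only the second one has error bounds.
toy_a <- data.frame(country = c("A", "B"), happiness_score = c(7.1, 4.2),
                    year = 2015)
toy_b <- data.frame(country = c("A", "B"), happiness_score = c(7.0, 4.5),
                    lower_whisker = c(6.9, 4.3), upper_whisker = c(7.1, 4.7),
                    year = 2016)

# Hypothetical helper: add any missing columns as NA, then fix the column order.
pad_columns <- function(df, cols) {
  for (col in setdiff(cols, names(df))) {
    df[[col]] <- NA_real_
  }
  df[, cols]
}

all_cols <- c("country", "happiness_score", "lower_whisker",
              "upper_whisker", "year")

# Once every table has the same columns, rbind() can stack them.
toy_all <- rbind(pad_columns(toy_a, all_cols), pad_columns(toy_b, all_cols))
toy_all
```

The wrangling code for the real files follows.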
# 2015 has no error bounds, so create NA placeholders (158 = assumed row count of the 2015 file)
lower_bound <- rep(NA, 158)
upper_bound <- rep(NA, 158)
happiness_2015 <- happiness_2015 %>%
  mutate(year = 2015) %>%
  select(1, 3, 4, 6:11, 13) %>%
  mutate(lower_bound = lower_bound, upper_bound = upper_bound) %>%
  # rename by position so every year ends up with the same names
  rename(country = 1,
         happiness_rank = 2,
         happiness_score = 3,
         gdp = 4,
         family = 5,
         life_expectancy = 6,
         freedom = 7,
         trust_corruption = 8,
         generosity = 9,
         year = 10,
         lower_whisker = 11,
         upper_whisker = 12)
happiness_2016 <- happiness_2016 %>%
  mutate(year = 2016) %>%
  select(1, 3, 4:12, 14) %>%
  rename(country = 1,
         happiness_rank = 2,
         happiness_score = 3,
         gdp = 6,
         family = 7,
         life_expectancy = 8,
         freedom = 9,
         trust_corruption = 10,
         generosity = 11,
         year = 12,
         lower_whisker = 4,
         upper_whisker = 5)
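happiness_2017 <- happiness_2017 %>%
  mutate(year = 2017) %>%
  select(-12) %>%
  # NOTE: renaming is by position, so it relies on the 2017 file's column order
  # (verify with names(happiness_2017) before trusting these labels)
  rename(country = 1,
         happiness_rank = 2,
         happiness_score = 3,
         gdp = 6,
         family = 7,
         life_expectancy = 8,
         freedom = 9,
         trust_corruption = 10,
         generosity = 11,
         year = 12,
         lower_whisker = 4,
         upper_whisker = 5)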
# 2018 and 2019 have no error bounds either (156 = assumed row count of these files)
lower_bound <- rep(NA, 156)
upper_bound <- rep(NA, 156)
happiness_2018 <- happiness_2018 %>%
  mutate(year = 2018) %>%
  mutate(lower_bound = lower_bound, upper_bound = upper_bound) %>%
  rename(country = 2,
         happiness_rank = 1,
         happiness_score = 3,
         gdp = 4,
         family = 5,
         life_expectancy = 6,
         freedom = 7,
         trust_corruption = 9,
         generosity = 8,
         year = 10,
         lower_whisker = 11,
         upper_whisker = 12)
happiness_2019 <- happiness_2019 %>%
  mutate(year = 2019) %>%
  mutate(lower_bound = lower_bound, upper_bound = upper_bound) %>%
  rename(country = 2,
         happiness_rank = 1,
         happiness_score = 3,
         gdp = 4,
         family = 5,
         life_expectancy = 6,
         freedom = 7,
         trust_corruption = 9,
         generosity = 8,
         year = 10,
         lower_whisker = 11,
         upper_whisker = 12)
happiness <- do.call("rbind", list(happiness_2015, happiness_2016, happiness_2017, happiness_2018, happiness_2019))
We then added suicide rates and mental health access to the dataset. Unfortunately, mental health access was only collected in one year, so that value remains constant across years within each country.
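To see why a per-country value repeats across years after the join, here is a minimal sketch on made-up data. The data frames and the column name units are hypothetical, and the sketch uses base R's merge(..., all.x = TRUE), which plays the same role as left_join() here.

```r
# Toy data (made-up values): happiness by country-year, facilities by country only.
toy_happiness <- data.frame(
  country = c("A", "A", "B", "B"),
  year = c(2015, 2016, 2015, 2016),
  happiness_score = c(7.1, 7.0, 4.2, 4.5)
)
toy_facilities <- data.frame(Country = c("A", "B"), units = c(0.8, 0.1))

# Keep every happiness row and attach the single per-country value.
# Because the key is country alone, each country's value is copied to all its years.
toy_joined <- merge(toy_happiness, toy_facilities,
                    by.x = "country", by.y = "Country", all.x = TRUE)
toy_joined
```

The join on the real data follows.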
suicide_death_rates <- suicide_death_rates %>%
  select(-2) %>%
  rename(sdr = 3)
countries <- happiness %>%
  left_join(mental_health_facilities, by = c("country" = "Country")) %>%
  left_join(suicide_death_rates, by = c("country" = "Entity", "year" = "Year")) %>%
  select(-13)
So, how happy are countries overall? The World Happiness Report calculates each country’s happiness score using survey data. Citizens of each country are asked where their own life ranks on a scale of 0 to 10, with 10 being the best life possible and 0 being the worst life possible. Their responses are weighted to generate a national score (see Table 1). Overall, people view their lives as being of average to slightly above average quality. The wide range between the maximum and minimum is telling: people in the happiest countries are much, much happier than those in the least happy countries.
happiness_summary <- countries %>%
  summarize(mean(happiness_score), median(happiness_score), sd(happiness_score),
            min(happiness_score), max(happiness_score))
options(knitr.table.format = "html")
kable(happiness_summary, digits = 2,
      col.names = c("Mean", "Median",
                    "Standard Dev.", "Minimum", "Maximum"),
      caption = "Table 1. Summary Statistics of National Happiness Scores")
| Mean | Median | Standard Dev. | Minimum | Maximum |
|---|---|---|---|---|
| 5.38 | 5.32 | 1.13 | 2.69 | 7.77 |
A First Look at the World Happiness Ranking
We want to directly visualize the world happiness ranking using a diverging bar chart, which is great for illustrating the spread of negative and positive values.
First, we created a new dataframe avg_happiness that contains the average happiness score for every country from 2015 to 2019.
avg_happiness <- countries %>%
  group_by(country) %>%
  summarize(avg_happiness = mean(happiness_score)) %>%
  as.data.frame()
Next, we created a new column happiness_z that contains the standardized happiness score (\(z = (x-\mu)/\sigma\)) for each country, and another column happiness_type that uses ifelse() to flag whether each country falls above or below the average (that is, above or below \(z = 0\)).
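As a quick check of the formula on a toy vector (made-up values, not from our dataset), the hand-computed z-scores match what base R's scale() returns by default:

```r
# Toy average scores (made-up values).
x <- c(3.5, 5.0, 6.5, 7.0)

# z-score by hand: subtract the mean, divide by the sample standard deviation.
z_manual <- (x - mean(x)) / sd(x)

# scale() centers and scales the same way by default; it returns a one-column matrix.
z_scale <- as.numeric(scale(x))

all.equal(z_manual, z_scale)

# Values below the mean get a negative z-score and are labelled "below".
ifelse(z_manual < 0, "below", "above")
```

Our computation on avg_happiness follows.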
avg_happiness$happiness_z <- round((avg_happiness$avg_happiness -
  mean(avg_happiness$avg_happiness)) / sd(avg_happiness$avg_happiness), 2)
avg_happiness$happiness_type <- ifelse(avg_happiness$happiness_z < 0, "below", "above")
We also sorted the rows by the standardized happiness score and converted the column country to a factor so that the plot retains the sorted order.
avg_happiness <- avg_happiness[order(avg_happiness$happiness_z), ]
avg_happiness$country <- factor(avg_happiness$country, levels = unique(avg_happiness$country))
Now, let’s visualize the diverging bars using geom_bar()!
happy_plot <- ggplot(avg_happiness, aes(x = country, y = happiness_z, label = happiness_z)) +
  geom_bar(stat = "identity", aes(fill = happiness_type)) +
  scale_fill_manual(name = "Happiness Score",
                    labels = c("Above Average", "Below Average"),
                    values = c("above" = "#DB7093", "below" = "#D8BFD8")) +
  coord_flip() +
  theme_fivethirtyeight() +
  labs(title = "Average of World Happiness 2015-2019")
happy_plot
Change in world happiness and suicide rate from 2015 to 2017
Then, we wanted to create a scatterplot to visualize (1) the change in world happiness and suicide rate from 2015 to 2017 by continent (we don’t have data for suicide rate after 2017) and (2) the relationship between happiness rating and suicide rate.
First, we used the library countrycode to create a new variable with the name of the continent to which each country belongs.
countries$continent <- countrycode(sourcevar = countries[, "country"],
                                   origin = "country.name",
                                   destination = "continent")
Then, we used filter() to only include data from 2015, 2016, and 2017, and visualized the relationship between happiness and suicide using geom_point(). We also made use of the function ggplotly() from the interactive graphing library plotly to add a slider that illustrates the change over the years.
# create a scatterplot to visualize the relationship between happiness and suicide
suicide_happiness_plot <- countries %>%
  filter(year %in% c(2015, 2016, 2017)) %>%
  ggplot(aes(x = happiness_score, y = sdr,
             color = continent, frame = year, ids = country)) +
  # constants are set outside aes() so they are not treated as data mappings
  geom_point(size = 1, alpha = 0.7) +
  scale_x_log10() +
  labs(x = "Happiness Score", y = "Suicide death rate per 100,000",
       size = NULL, color = "Continent") +
  theme_minimal() +
  scale_color_manual(values = c("#DEB887", "#CD5C5C", "#FFD700", "#ADD8E6", "#9ACD32"))
ggplotly(suicide_happiness_plot)
# create a scatterplot that shows the relationship between mental health facilities and happiness scores
# countries %>% ggplot(aes(x = happiness_score, y = Mental.hospitals..per.100.000.population.)) +
# scale_y_log10() +
# scale_x_log10() +
# geom_jitter(alpha = 0.5, color = "#BC8F8F")
(we don’t have to use this graph - I’m just curious about the log transformed relationship)
Mental health access, happiness, and suicide
How does mental health access factor into happiness and suicide? We used the indicator of mental health access with the fewest NAs, which is the number of mental health units in general hospitals per 100,000 population. Because this variable is highly right-skewed, we plot it on a log scale to better visualize the relationship.
The two plots reveal that mental health access appears to have a stronger relationship with happiness than with the suicide death rate. The confidence band around the regression line for the suicide rate shows that the trend could be flat or even negative, rather than the small positive relationship seen here. The stronger relationship between mental health access and happiness is unsurprising: greater access to mental healthcare may increase happiness, or another factor, such as a country’s wealth, may drive both.
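One way to put numbers on the uncertainty around a regression line is to fit the same kind of linear model directly and look at the confidence interval of the slope. The following is a minimal sketch on simulated data: the variable names access and outcome and all values are assumptions for illustration, not our dataset, so it says nothing about the actual results.

```r
set.seed(1)
# Simulated stand-ins: a right-skewed access measure and a noisy outcome.
access <- rlnorm(50, meanlog = -2, sdlog = 1)
outcome <- 10 + 0.3 * log10(access) + rnorm(50, sd = 4)

# Regress on log10(access), matching a log10-scaled x axis.
fit <- lm(outcome ~ log10(access))

# 95% confidence interval for the slope; if it spans 0, a flat or
# opposite-signed trend is compatible with the data.
slope_ci <- confint(fit)["log10(access)", ]
slope_ci
```

The same check could be run on the 2016 rows of countries, after dropping rows where the access measure is missing or zero.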
# side-by-side comparison of mental health access versus happiness/suicide rate
happiness_vs_health <- countries %>%
  filter(year == 2016) %>%
  ggplot(aes(x = Mental.health.units.in.general.hospitals..per.100.000.population., y = happiness_score)) +
  # na.rm belongs in the layers, where it silently drops rows with missing values
  geom_point(color = "darkorchid2", na.rm = TRUE) +
  geom_smooth(method = "lm", color = "darkorchid4", na.rm = TRUE) +
  scale_x_log10() +
  theme_bw() +
  labs(x = "(Log of) Mental Health Units in General Hospitals per 100,000",
       y = "Happiness Score (2016)",
       title = "Log of Mental Health access versus Happiness Score")
suicide_vs_health <- countries %>%
  filter(year == 2016) %>%
  ggplot(aes(x = Mental.health.units.in.general.hospitals..per.100.000.population., y = sdr)) +
  geom_point(color = "royalblue2", na.rm = TRUE) +
  geom_smooth(method = "lm", color = "royalblue4", na.rm = TRUE) +
  scale_x_log10() +
  theme_bw() +
  labs(x = "(Log of) Mental Health Units in General Hospitals per 100,000",
       y = "Suicide Death Rate (2016)",
       title = "Log of Mental Health access versus Suicide Death Rate")
happiness_vs_health
suicide_vs_health
We also need a table somewhere in our blogpost.